{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b95343ee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os #导入Python的os模块，用于访问与操作系统交互的功能。\n",
    "from dotenv import load_dotenv  #从dotenv包中导入load_dotenv函数，用于加载.env文件中的环境变量。\n",
    "load_dotenv(override=True) #加载环境变量，如果环境变量已存在，则覆盖它们。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17b92fb6",
   "metadata": {},
   "source": [
    "这行是Jupyter Notebook的魔法命令（以!开头），用于通过uv（可能是某个特定工具的别名）安装或更新Python包。-qU表示静默更新，-i指定了包的源为清华大学的镜像。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "14fe6925",
   "metadata": {},
   "outputs": [],
   "source": [
    "! uv add -qU langchain-community langchain-experimental pandas -i https://pypi.tuna.tsinghua.edu.cn/simple"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54162148",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 7043 entries, 0 to 7042\n",
      "Data columns (total 21 columns):\n",
      " #   Column            Non-Null Count  Dtype  \n",
      "---  ------            --------------  -----  \n",
      " 0   customerID        7043 non-null   object \n",
      " 1   gender            7043 non-null   object \n",
      " 2   SeniorCitizen     7043 non-null   int64  \n",
      " 3   Partner           7043 non-null   object \n",
      " 4   Dependents        7043 non-null   object \n",
      " 5   tenure            7043 non-null   int64  \n",
      " 6   PhoneService      7043 non-null   object \n",
      " 7   MultipleLines     7043 non-null   object \n",
      " 8   InternetService   7043 non-null   object \n",
      " 9   OnlineSecurity    7043 non-null   object \n",
      " 10  OnlineBackup      7043 non-null   object \n",
      " 11  DeviceProtection  7043 non-null   object \n",
      " 12  TechSupport       7043 non-null   object \n",
      " 13  StreamingTV       7043 non-null   object \n",
      " 14  StreamingMovies   7043 non-null   object \n",
      " 15  Contract          7043 non-null   object \n",
      " 16  PaperlessBilling  7043 non-null   object \n",
      " 17  PaymentMethod     7043 non-null   object \n",
      " 18  MonthlyCharges    7043 non-null   float64\n",
      " 19  TotalCharges      7043 non-null   object \n",
      " 20  Churn             7043 non-null   object \n",
      "dtypes: float64(1), int64(2), object(18)\n",
      "memory usage: 1.1+ MB\n"
     ]
    }
   ],
   "source": [
    "import numpy as np #导入NumPy库，并简写为np，用于高效的数值计算\n",
    "import pandas as pd  #导入Pandas库，并简写为pd，用于数据处理和分析。\n",
    "# Load the dataset\n",
    "# The dataset is available at https://www.kaggle.com/datasets/blastchar/telco-customer-churn\n",
    "dataset = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')\n",
    "# Display 5 rows × 21 columns\n",
    "pd.set_option('max_colwidth',200) #设置Pandas显示选项，列宽最大为200字符。\n",
    "dataset.head(5) #显示数据集的前5行，以便快速查看数据的结构和内容。\n",
    "dataset.info()  #显示数据集的信息，包括每列的数据类型和非空值的数量，以便了解数据的基本结构。\n",
    "dataset.describe()  #显示数据集的统计信息，包括数值型列的计数、均值、标准差、最小值、最大值等，以便了解数据的分布情况。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "567b1f53",
   "metadata": {},
   "source": [
    "其中数据集各字段解释如下：\n",
    "|字段|解释|\n",
    "|---|---|\n",
    "|customerID|用户ID|\n",
    "|gender|性别|\n",
    "|SeniorCitizen|是否是老年人（1代表是）|\n",
    "|Partner|是否有配偶（Yes or No）|\n",
    "|Dependents|是否经济独立（Yes or No）|\n",
    "|tenure|用户入网时间|\n",
    "|PhoneService|是否开通电话业务（Yes or No）|\n",
    "|MultipleLines|是否开通多条电话业务（Yes 、 No or No phoneservice）|\n",
    "|InternetService|是否开通互联网服务（No、DSL数字网络或filber potic光线网络）|\n",
    "|OnlineSecurity|是否开通网络安全服务（Yes、No or No internetservice）|\n",
    "|OnlineBackup|是否开通在线备份服务（Yes、No or No internetservice）|\n",
    "|DeviceProtection|是否开通设备保护服务（Yes、No or No internetservice）\n",
    "|TechSupport|是否开通技术支持业务（Yes、No or No internetservice）|\n",
    "|StreamingTV|是否开通网络电视（Yes、No or No internetservice）|\n",
    "|StreamingMovies|是否开通网络电影（Yes、No or No internetservice）|\n",
    "|Contract|合同签订方式（按月、按年或者两年）|\n",
    "|PaperlessBilling|是否开通电子账单（Yes or No）|\n",
    "|PaymentMethod|付款方式（bank transfer、credit card、electronic check、mailed check）|\n",
    "|MonthlyCharges|月度费用|\n",
    "|TotalCharges|总费用|\n",
    "|Churn|是否流失（Yes or No）|"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbe3d994",
   "metadata": {},
   "source": [
    "- 测试LangChain内置代码解释器工具功能\n",
    "\n",
    "重新加载数据集到df变量。\n",
    "创建一个PythonAstREPLTool工具实例，该工具可以执行Python代码，且能访问局部变量df。\n",
    "使用tool.invoke方法执行代码，计算SeniorCitizen字段的平均值。\n",
    "直接计算MonthlyCharges字段的平均值（此行代码的结果未被使用）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3daba10e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(0.1621468124378816)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "from langchain_core.prompts import ChatPromptTemplate\n",
    "from langchain_experimental.tools import PythonAstREPLTool\n",
    "# Create a tool to execute Python code\n",
    "# The tool will have access to the dataset loaded in the local variable `df`\n",
    "df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')\n",
    "tool = PythonAstREPLTool(locals={\"df\": df})\n",
    "tool.invoke(\"df['SeniorCitizen'].mean()\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f2935946",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(64.76169246059918)"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['MonthlyCharges'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad0346bf",
   "metadata": {},
   "source": [
    "- 创建LangChain工作流并绑定内置工具\n",
    "\n",
    "\n",
    "初始化一个聊天模型（LLM），这里使用的是deepseek-chat。\n",
    "将之前创建的tool绑定到LLM上，创建一个带有工具链的LLM实例。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "df617cb5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_0_582f48a6-02e4-49a6-ac5a-70f5349fd9a2', 'function': {'arguments': '{\"query\":\"df[\\'MonthlyCharges\\'].mean()\"}', 'name': 'python_repl_ast'}, 'type': 'function', 'index': 0}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 29, 'prompt_tokens': 173, 'total_tokens': 202, 'completion_tokens_details': None, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 128}, 'prompt_cache_hit_tokens': 128, 'prompt_cache_miss_tokens': 45}, 'model_name': 'deepseek-chat', 'system_fingerprint': 'fp_8802369eaa_prod0623_fp8_kvcache', 'id': '2ce8713a-7a5b-4bc4-8936-76d44af2ba9f', 'service_tier': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run--11de2d92-92c2-4c93-bca1-7ca7734a5540-0', tool_calls=[{'name': 'python_repl_ast', 'args': {'query': \"df['MonthlyCharges'].mean()\"}, 'id': 'call_0_582f48a6-02e4-49a6-ac5a-70f5349fd9a2', 'type': 'tool_call'}], usage_metadata={'input_tokens': 173, 'output_tokens': 29, 'total_tokens': 202, 'input_token_details': {'cache_read': 128}, 'output_token_details': {}})"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv \n",
    "load_dotenv(override=True)\n",
    "\n",
    "from langchain_core.messages import AIMessage, HumanMessage, SystemMessage\n",
    "from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder\n",
    "from langchain.chat_models import init_chat_model\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "\n",
    "model  = init_chat_model(model=\"deepseek-chat\", model_provider=\"deepseek\")\n",
    "\n",
    "llm_with_tools = model.bind_tools([tool])\n",
    "\n",
    "response = llm_with_tools.invoke(\n",
    "    \"我有一张表，名为'df'，请帮我计算MonthlyCharges字段的均值。\"\n",
    ")\n",
    "response"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69e28232",
   "metadata": {},
   "source": [
    "- 添加输出解析器并处理查询\n",
    "\n",
    "创建一个输出解析器，用于解析LLM工具链的输出。\n",
    "将解析器绑定到LLM工具链上。\n",
    "使用新的工具链处理相同的查询。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d5031411",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'query': \"df['MonthlyCharges'].mean()\"}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.output_parsers.openai_tools import JsonOutputKeyToolsParser\n",
    "\n",
    "parser = JsonOutputKeyToolsParser(key_name=tool.name, first_tool_only=True)\n",
    "# Bind the parser to the LLM with tools\n",
    "llm_chain = llm_with_tools | parser\n",
    "# Invoke the chain with a query\n",
    "# The response will be a dictionary with the tool's output\n",
    "llm_chain.invoke(\"我有一张表，名为'df'，请帮我计算MonthlyCharges字段的均值。\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "558646be",
   "metadata": {},
   "source": [
    "- 使用处理链处理不同查询\n",
    "\n",
    "定义一个系统提示，告知用户可以使用df数据框和df.head().to_markdown()查看数据。\n",
    "创建一个聊天提示模板，包含系统提示和用户问题占位符。\n",
    "创建一个完整的处理链，从提示模板到LLM工具链再到输出解析器。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "3abc02b5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'query': \"df['MonthlyCharges'].mean()\"}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "system = f\"\"\"\n",
    "你可以访问一个名为 `df` 的 pandas 数据框，你可以使用df.head().to_markdown() 查看数据集的基本信息， \\\n",
    "请根据用户提出的问题，编写 Python 代码来回答。只返回代码，不返回其他内容。只允许使用 pandas 和内置库。\n",
    "\"\"\"\n",
    "prompt = ChatPromptTemplate([\n",
    "    (\"system\", system),\n",
    "    (\"user\", \"{question}\")\n",
    "])\n",
    "code_chain = prompt | llm_with_tools | parser\n",
    "code_chain.invoke({\"question\": \"请帮我计算MonthlyCharges字段的均值。\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c0e6a0e1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "np.float64(64.76169246059918)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain = prompt | llm_with_tools | parser | tool\n",
    "chain.invoke({\"question\": \"请帮我计算MonthlyCharges字段的均值。\"})\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f800d74c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(Churn         No       Yes\n",
       " gender                    \n",
       " Female  0.730791  0.269209\n",
       " Male    0.738397  0.261603,\n",
       " Churn                No       Yes\n",
       " SeniorCitizen                    \n",
       " 0              0.763938  0.236062\n",
       " 1              0.583187  0.416813,\n",
       " SeniorCitizen         0         1\n",
       " gender                           \n",
       " Female         0.837156  0.162844\n",
       " Male           0.838537  0.161463)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke({\"question\": \"请帮我分析gender、SeniorCitizen和Churn三个字段之间的相关关系。\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "073861d1",
   "metadata": {},
   "source": [
    "- 添加代码打印节点\n",
    "\n",
    "定义一个函数code_print，用于打印即将运行的Python代码。\n",
    "创建一个RunnableLambda节点，包装code_print函数。\n",
    "将打印节点添加到处理链中，形成一个新的处理链。\n",
    "使用新的处理链处理一个查询，打印即将运行的代码。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "79d9e9bd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "即将运行Python代码: df['MonthlyCharges'].mean()\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "np.float64(64.76169246059918)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain_core.runnables import RunnableLambda\n",
    "\n",
    "def code_print(res):\n",
    "    print(\"即将运行Python代码:\", res['query'])\n",
    "    return res\n",
    "\n",
    "print_node = RunnableLambda(code_print)\n",
    "\n",
    "print_code_chain = prompt | llm_with_tools | parser | print_node | tool\n",
    "print_code_chain.invoke({\"question\": \"请帮我计算MonthlyCharges字段的均值。\"})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
